Two of the main frameworks used for modeling information diffusion online are epidemic models and Hawkes point processes. The former treats information as a viral contagion that spreads through a population of online users, and employs tools initially developed in the field of epidemiology. The latter views individual broadcasts of information as events in a point process and modulates the event rate according to observed (or assumed) social principles; Hawkes processes have been broadly used in fields such as finance and geophysics. Here, we study for the first time the connection between these two mature frameworks, and we find them to be equivalent. More precisely, the rate of events in the Hawkes model is identical to the rate of new infections in the Susceptible-Infected-Recovered (SIR) model when taking the expectation over recovery events, which are unobserved in a Hawkes process. This paves the way for applying tools developed in one framework to the other. We make three further contributions in this work. First, we propose HawkesN, an extension of the basic Hawkes model in which we introduce the notion of a finite maximum number of events that can occur. Second, we show that HawkesN explains real retweet cascades better than the current state-of-the-art Hawkes modeling. The size of the population can be learned while observing the cascade, at the expense of requiring larger amounts of training data. Third, we employ an SIR method based on Markov chains for computing the final size distribution of a partially observed cascade fitted with HawkesN. We propose an explanation for the generally perceived randomness of online popularity: the final size distribution for real diffusion cascades tends to have two maxima, one corresponding to large cascade sizes and another one around zero.
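To illustrate the finite-population idea behind HawkesN, the following is a minimal sketch of a conditional intensity under stated assumptions. It uses an exponential memory kernel with hypothetical parameters `kappa` (expected number of events directly triggered by one event) and `theta` (decay rate), no background rate, and a hypothetical population size `n_pop`. The abstract does not give HawkesN's exact parametrization, so this is one plausible form rather than the authors' definition. It takes the usual self-exciting Hawkes intensity and scales it by the fraction of the population that has not yet produced an event, 1 - N_t/N. This mirrors the SIR model, where the rate of new infections is proportional to the susceptible fraction S/N.

```python
import math
from typing import Sequence


def hawkesn_intensity(
    t: float,
    history: Sequence[float],
    kappa: float,
    theta: float,
    n_pop: float,
) -> float:
    """Conditional intensity of a HawkesN-style process at time t (sketch).

    Assumptions (not taken from the source): exponential kernel
    kappa * theta * exp(-theta * dt), no background rate, and the
    population factor (1 - N_t / n_pop), where N_t counts the events
    strictly before t.
    """
    past = [tj for tj in history if tj < t]
    n_t = len(past)  # number of events observed strictly before t
    if n_t >= n_pop:
        return 0.0  # population exhausted: no further events can occur
    # Self-exciting part: every past event adds a decaying contribution.
    excitation = sum(kappa * theta * math.exp(-theta * (t - tj)) for tj in past)
    # Scale by the fraction of the population still "susceptible".
    return (1.0 - n_t / n_pop) * excitation


if __name__ == "__main__":
    events = [0.0, 0.5, 1.2]
    for n in (10.0, 100.0, float("inf")):
        print(n, hawkesn_intensity(2.0, events, kappa=0.8, theta=1.0, n_pop=n))
```

Setting `n_pop` to `float("inf")` makes the scaling factor equal to 1, which recovers a standard exponential-kernel Hawkes intensity. A small `n_pop` damps the intensity as the cascade grows and drives it to zero once the population is exhausted.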